Visual Representations and Models: From Latent SVM to Deep Learning
Abstract
Two important components of a visual recognition system are representation and model. Both involve selecting and learning the features that are indicative for recognition and discarding those that are uninformative. This thesis, in its general form, proposes different techniques within the frameworks of two learning systems for representation and modeling, namely latent support vector machines (latent SVMs) and deep learning. In the framework of learning with latent SVMs, we mainly deal with deformable part models (DPMs) and offer several methods to improve them. To evaluate the effectiveness of these methods, we conduct our experiments on the task of object category classification and localization, mainly using the Pascal VOC dataset. First, we propose various approaches to group the positive samples into clusters of visually similar instances. Given a fixed representation, the sampled space of the positive distribution is usually structured. We show that addressing this structure explicitly improves the quality of feature selection for the positive class. The proposed clustering techniques include a novel similarity measure based on exemplar learning, an approach for using additional annotation, and an augmentation of latent SVM that automatically finds clusters whose members can be reliably distinguished from the background class. Moreover, we show that sharing samples among these clusters can smooth the decision boundaries of a mixture-of-experts model (such as DPM) and improve the performance of the final mixture classifier. In another effort, a strongly supervised DPM is suggested to study how these models can benefit from privileged information. The extra information comes in the form of semantic part annotations (i.e. their presence and location), which are used to constrain the DPM's latent variables during or prior to the optimization of the latent SVM. Its effectiveness is demonstrated on the task of animal detection.
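For orientation, the latent SVM formulation commonly used with DPMs in the literature can be sketched as below. This is the standard textbook form, not necessarily the notation of the thesis: the weight vector w, the joint feature map Φ, the set Z(x) of admissible latent values (for example part placements or mixture components) and the regularization constant C are conventional symbols assumed here for illustration.

```latex
% Score of a sample x: maximize over the latent variable z
f_w(x) = \max_{z \in Z(x)} \; w \cdot \Phi(x, z)

% Training objective over labeled samples (x_i, y_i), with y_i \in \{-1, +1\}:
% an L2 regularizer plus the hinge loss on the latent score
L(w) = \frac{1}{2} \lVert w \rVert^2
       + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i \, f_w(x_i)\bigr)
```

Because f_w is a maximum of linear functions, the hinge loss is convex in w for negative samples but not for positive ones. Training therefore usually alternates between fixing the latent values of the positives and solving the resulting convex problem.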
Finally, we generalize the formulation of discriminative latent variable models, including DPMs, to incorporate a new set of latent variables representing the structure or properties of negative samples. We therefore term them negative latent variables. We show that this generalization affects state-of-the-art techniques and helps visual recognition by explicitly searching for counter-evidence of an object's presence. Following the resurgence of deep networks, the last works of this thesis focus on deep learning in order to produce a generic representation for visual recognition. A Convolutional Network (ConvNet) is trained on ImageNet, a large annotated image classification dataset with about 1.3 million images. The activations at each layer of the trained ConvNet can then be treated as the representation of an input image. We show that such a representation is surprisingly effective for various recognition tasks, making it clearly superior to all the handcrafted features previously used in visual recognition (such as HOG in our first works on DPM). We further investigate how this representation can be improved for a given task. We propose various factors, applied before or after the training of the representation, that can improve the efficacy of the ConvNet representation. These factors are analyzed on 16 datasets from various subfields of visual recognition.

Dedicated to Sahar; for being there through it all.
Similar resources
Machine learning based Visual Evoked Potential (VEP) Signals Recognition
Introduction: Visual evoked potentials contain certain diagnostic information which have proved to be of importance in the visual systems functional integrity. Due to substantial decrease of amplitude in extra macular stimulation in commonly used pattern VEPs, differentiating normal and abnormal signals can prove to be quite an obstacle. Due to developments of use of machine l...
Ontology Learning with Deep Learning: a Case Study on Patient Safety Using PubMed
Traditional distributional semantic models (DSMs) like Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) derive representations for words assuming words occurring in similar contexts will have similar representations. Deep Learning has made feasible the derivation of word embeddings (i.e. distributed word representations) from corpora of billions of words applying neural lang...
EMG-based wrist gesture recognition using a convolutional neural network
Background: Deep learning has revolutionized artificial intelligence and has transformed many fields. It allows processing high-dimensional data (such as signals or images) without the need for feature engineering. The aim of this research is to develop a deep learning-based system to decode motor intent from electromyogram (EMG) signals. Methods: A myoelectric system based on convolutional ne...
Learning Deep Generative Models
Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many artificial intelligence–related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that ...
An Adversarial Neuro-Tensorial Approach For Learning Disentangled Representations
Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, to mention a few. Each factor accounts for a source of variability in the data, while the multiplicative interactions of these factors emulate the entangled variability, giving rise to the rich structure of visual object appearance. Disentangling such unobserved factors fr...